Sentiment Classification with Graph Co-Regularization
نویسندگان
چکیده
Sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of user-generated sentiment data (e.g., reviews, blogs). To obtain sentiment classification with high accuracy, supervised techniques require a large amount of manually labeled data. The labeling work can be time-consuming and expensive, which makes unsupervised (or semisupervised) sentiment analysis essential for this application. In this paper, we propose a novel algorithm, called graph co-regularized non-negative matrix tri-factorization (GNMTF), from the geometric perspective. GNMTF assumes that if two words (or documents) are sufficiently close to each other, they tend to share the same sentiment polarity. To achieve this, we encode the geometric information by constructing the nearest neighbor graphs, in conjunction with a nonnegative matrix tri-factorization framework. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. Our empirical study on two open data sets validates that GNMTF can consistently improve the sentiment classification accuracy in comparison to the state-of-the-art methods.
منابع مشابه
Marginalized Denoising Autoencoder via Graph Regularization for Domain Adaptation
Domain adaptation, which aims to learn domain-invariant features for sentiment classification, has received increasing attention. The underlying rationality of domain adaptation is that the involved domains share some common latent factors. Recently neural network based on Stacked Denoising Auto-Encoders (SDA) and its marginalized version (mSDA) have shown promising results on learning domain-i...
متن کاملBuilding and exploiting a French corpus for sentiment analysis (Construction et exploitation d'un corpus français pour l'analyse de sentiment) [in French]
Building and exploiting a French corpus for sentiment analysis This work introduces a French corpus for sentiment analysis. We describe the construction and organization of the corpus. We then apply machine learning techniques to automatically predict whether a text is positive or negative (the opinion classification task). Two techniques are used : logistic regression and classification based ...
متن کاملIdentifying Sentiment Words Using an Optimization Model with L1 Regularization
Sentiment word identification is a fundamental work in numerous applications of sentiment analysis and opinion mining, such as review mining, opinion holder finding, and twitter classification. In this paper, we propose an optimization model with L1 regularization, called ISOMER, for identifying the sentiment words from the corpus. Our model can employ both seed words and documents with sentime...
متن کاملMicroblog Sentiment Classification with Contextual Knowledge Regularization
Microblog sentiment classification is an important research topic which has wide applications in both academia and industry. Because microblog messages are short, noisy and contain masses of acronyms and informal words, microblog sentiment classification is a very challenging task. Fortunately, collectively the contextual information about these idiosyncratic words provide knowledge about their...
متن کاملA Semi-supervised Method for Multimodal Classification of Consumer Videos
In large databases, the lack of labeled training data leads to major difficulties in classification. Semi-supervised algorithms are employed to suppress this problem. Video databases are the epitome for such a scenario. Fortunately, graph-based methods have shown to form promising platforms for Semi-supervised video classification. Based on multimodal characteristics of video data, different fe...
متن کامل